Feature/skills aggregator: add SkillEvalAggregator for batch evaluation comparisons by venkatkrish543re · Pull Request #229 · strands-agents/evals

venkatkrish543re · 2026-05-14T01:23:34Z


Adds a skills/ subpackage providing paired-comparison aggregation for evaluating agent skills against a baseline. Mirrors the chaos aggregator pattern from feature/aggregator-demo-2.

- SkillEvalAggregator with Wilcoxon, paired-t, and McNemar tests
- Bootstrap CI on the mean delta (1000 resamples)
- Corruption filtering before paired statistics
- SkillEvalExperiment composes the base Experiment
- Rich-based interactive display
- 44 unit tests covering paired stats, corruption filtering, pairing, and serialization

Closes strands-agents#228
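The description lists the statistical pieces but not how they fit together. As a minimal, stdlib-only sketch (not the PR's code), the pipeline below pairs baseline and skill results by case, filters corrupted pairs before any statistics, computes a percentile bootstrap CI on the mean delta with 1000 resamples, and runs an exact McNemar test on pass/fail outcomes. CaseResult, pair_results, bootstrap_mean_ci, and mcnemar_exact are hypothetical names; treating a missing or NaN score as "corrupted" and using the percentile bootstrap method are assumptions. The Wilcoxon and paired-t tests are omitted here; SciPy provides them as scipy.stats.wilcoxon and scipy.stats.ttest_rel.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class CaseResult:
    """One evaluated case (hypothetical shape, not the PR's data model)."""
    case_id: str
    score: Optional[float]  # None or NaN is treated as a corrupted run (assumption)
    passed: bool


Pair = Tuple[CaseResult, CaseResult]


def _corrupted(r: CaseResult) -> bool:
    # Assumed corruption rule: missing or NaN score.
    return r.score is None or math.isnan(r.score)


def pair_results(baseline: Sequence[CaseResult], skill: Sequence[CaseResult]) -> List[Pair]:
    """Pair baseline/skill results by case_id, then drop corrupted pairs."""
    by_id = {r.case_id: r for r in baseline}
    pairs: List[Pair] = []
    for s in skill:
        b = by_id.get(s.case_id)
        if b is None:
            continue  # no baseline for this case, so it cannot be paired
        if _corrupted(b) or _corrupted(s):
            continue  # drop the whole pair so both samples stay aligned
        pairs.append((b, s))
    return pairs


def bootstrap_mean_ci(deltas: Sequence[float], n_resamples: int = 1000,
                      alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of paired deltas."""
    if not deltas:
        raise ValueError("need at least one paired delta")
    rng = random.Random(seed)  # seeded so the CI is reproducible
    n = len(deltas)
    # Resample deltas with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choice(deltas) for _ in range(n)) / n for _ in range(n_resamples)
    )
    lo_idx = int(round(alpha / 2 * n_resamples))
    hi_idx = max(lo_idx, int(round((1 - alpha / 2) * n_resamples)) - 1)
    return means[lo_idx], means[hi_idx]


def mcnemar_exact(pairs: Sequence[Pair]) -> float:
    """Two-sided exact McNemar p-value from discordant pass/fail pairs."""
    regressions = sum(1 for b, s in pairs if b.passed and not s.passed)
    improvements = sum(1 for b, s in pairs if not b.passed and s.passed)
    n = regressions + improvements
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(regressions, improvements)
    # Under H0 each discordant pair is a fair coin flip: Binomial(n, 0.5).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    baseline = [CaseResult("a", 0.5, False), CaseResult("b", 0.6, True),
                CaseResult("c", None, False), CaseResult("d", 0.2, False)]
    skill = [CaseResult("a", 0.9, True), CaseResult("b", 0.7, True),
             CaseResult("c", 0.8, True), CaseResult("d", 0.6, True)]
    pairs = pair_results(baseline, skill)  # case "c" is dropped as corrupted
    deltas = [s.score - b.score for b, s in pairs]
    print(bootstrap_mean_ci(deltas), mcnemar_exact(pairs))
```

Dropping the whole pair, rather than only the corrupted side, keeps both samples aligned, which every paired test requires. McNemar only looks at discordant pairs, so cases that pass (or fail) under both conditions do not affect its p-value.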

ybdarrenwang and others added 9 commits May 6, 2026 19:21

- first implement of chaos module (fd25de4)
- fix tool output corruption (93f9087)
- refactor with contextvar (db1b58c)
- draft chaos aggregator and display (9e64fe4)
- add resilience evaluators (e02d02c)
- improve display and interface (8818ac7)
- fix rebase (98fc6a5)
- implement new display (3ef6c47)

venkatkrish543re had a problem deploying to manual-approval on May 14, 2026 01:23 with GitHub Actions (Failure)

yonib05 added the area-evaluators (Evaluators: output, trajectory, tool use, interactions, and LLM-as-judge quality metrics) and enhancement (New feature or request) labels on Jun 11, 2026

venkatkrish543re wants to merge 9 commits into strands-agents:main from venkatkrish543re:feature/skills-aggregator


3 participants
